    Performance of customized DCT quantization tables on scientific data

    We show that it is desirable to use data-specific or customized quantization tables for scaling the spatial frequency coefficients obtained using the Discrete Cosine Transform (DCT). The DCT is widely used for image and video compression (MP89, PM93), but applications typically use default quantization matrices. Using actual scientific data gathered from diverse sources such as spacecraft and electron microscopes, we show that the default compression/quality tradeoffs can be significantly improved upon by using customized tables. We also show that significant improvements are possible for the standard test images Lena and Baboon. This work is part of an effort to develop a practical scheme for optimizing quantization matrices for any given image or video stream, under any given quality or compression constraints.
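
    A minimal sketch of the scaling step described above: the DCT coefficients of an 8x8 block are divided elementwise by a quantization table, rounded, and later multiplied back. The tables and function names below are illustrative assumptions, not the customized tables from the paper, and the sketch relies on SciPy's DCT routines.

```python
# Minimal sketch (requires NumPy and SciPy): quantizing 8x8 DCT blocks with a
# custom table. The table values here are illustrative, not the paper's
# optimized tables.
import numpy as np
from scipy.fft import dctn, idctn

def quantize_block(block, qtable):
    """Forward 2-D DCT, scale by the quantization table, and round."""
    coeffs = dctn(block, norm="ortho")
    return np.round(coeffs / qtable)          # coarser steps where qtable is large

def dequantize_block(qcoeffs, qtable):
    """Undo the scaling and return to the pixel domain."""
    return idctn(qcoeffs * qtable, norm="ortho")

# A flat default-style table versus a hypothetical customized table that
# keeps low spatial frequencies finer than high ones.
default_q = np.full((8, 8), 16.0)
custom_q = 2.0 * np.outer(np.arange(1, 9), np.arange(1, 9)).astype(float)

block = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(float)
for name, q in (("default", default_q), ("custom", custom_q)):
    rec = dequantize_block(quantize_block(block, q), q)
    print(name, "RMS error:", np.sqrt(np.mean((block - rec) ** 2)))
```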

    Method and system for data clustering for very large databases

    Multi-dimensional data contained in very large databases is efficiently and accurately clustered to determine patterns therein and extract useful information from such patterns. Conventional computer processors may be used which have limited memory capacity and conventional operating speed, allowing massive data sets to be processed in a reasonable time and with reasonable computer resources. The clustering process is organized using a clustering feature tree structure wherein each clustering feature comprises the number of data points in the cluster, the linear sum of the data points in the cluster, and the square sum of the data points in the cluster. A dense region of data points is treated collectively as a single cluster, and points in sparsely occupied regions can be treated as outliers and removed from the clustering feature tree. The clustering can be carried out continuously with new data points being received and processed, and with the clustering feature tree being restructured as necessary to accommodate the information from the newly received data points.
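
    A minimal sketch of the clustering feature described above, the triple of point count, linear sum, and square sum; this is an illustrative reconstruction, not the patented system's code. Because all three components are additive, two clusters (or tree nodes) can be merged without revisiting the original data points, and the centroid and radius needed for threshold tests follow directly from the triple.

```python
# Minimal sketch of the clustering feature (CF) triple; names are illustrative.
import numpy as np

class ClusteringFeature:
    def __init__(self, dim):
        self.n = 0                      # number of data points in the cluster
        self.ls = np.zeros(dim)         # linear sum of the data points
        self.ss = 0.0                   # square sum (sum of squared norms)

    def add(self, point):
        p = np.asarray(point, dtype=float)
        self.n += 1
        self.ls += p
        self.ss += float(p @ p)

    def merge(self, other):
        """CF entries are additive, so subtrees combine without rescanning data."""
        self.n += other.n
        self.ls += other.ls
        self.ss += other.ss

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        """Root-mean-square distance of member points from the centroid."""
        c = self.centroid()
        return np.sqrt(max(self.ss / self.n - float(c @ c), 0.0))

cf = ClusteringFeature(dim=2)
for p in [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1)]:
    cf.add(p)
print(cf.n, cf.centroid(), cf.radius())
```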

    Multiclass Query Scheduling in Real-Time Database Systems

    In recent years, a demand for real-time systems that can manipulate large amounts of shared data has led to the emergence of real-time database systems (RTDBS) as a research area. This paper focuses on the problem of scheduling queries in RTDBSs. We introduce and evaluate a new algorithm called Priority Adaptation Query Resource Scheduling (PAQRS) for handling both single-class and multiclass query workloads. The performance objective of the algorithm is to minimize the number of missed deadlines, while at the same time ensuring that any deadline misses are scattered across the different classes according to an administratively-defined miss distribution. This objective is achieved by dynamically adapting the system’s admission, memory allocation, and priority assignment policies according to its current resource configuration and workload characteristics. A series of experiments confirms that PAQRS is very effective for real-time query scheduling.
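
    The abstract describes adapting priority assignment so that misses follow an administratively defined per-class distribution. The sketch below is a hedged illustration of that idea only, not the PAQRS algorithm: the class names, target shares, and deadline-bias heuristic are assumptions made for the example.

```python
# Hedged sketch (not the PAQRS algorithm itself): bias query priorities so that
# deadline misses track an administrator-defined per-class distribution.
import heapq
import itertools

TARGET_MISS_SHARE = {"gold": 0.1, "silver": 0.3, "bronze": 0.6}   # assumed config
miss_counts = {c: 0 for c in TARGET_MISS_SHARE}

def miss_pressure(cls):
    """Positive when a class is still below its allowed share of total misses."""
    total = sum(miss_counts.values()) or 1
    return TARGET_MISS_SHARE[cls] - miss_counts[cls] / total

counter = itertools.count()
ready_queue = []   # min-heap ordered by effective deadline

def submit(query_id, cls, deadline):
    # Classes that have already absorbed more than their share of misses get an
    # earlier effective deadline, and therefore run first.
    effective = deadline + 10.0 * miss_pressure(cls)
    heapq.heappush(ready_queue, (effective, next(counter), query_id, cls))

def record_miss(cls):
    miss_counts[cls] += 1

record_miss("gold")                     # gold has already missed a deadline
submit("q1", "gold", deadline=100.0)
submit("q2", "bronze", deadline=100.0)
print(heapq.heappop(ready_queue)[2])    # "q1": gold is now favored over bronze
```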

    Transactional Client-Server Cache Consistency: Alternatives and Performance

    Client-server database systems based on a page server model can exploit client memory resources by caching copies of pages across transaction boundaries. Caching reduces the need to obtain data from servers or other sites on the network. In order to ensure that such caching does not result in the violation of transaction semantics, a cache consistency maintenance algorithm is required. Many such algorithms have been proposed in the literature and, as all provide the same functionality, performance is a primary concern in choosing among them. In this paper we provide a taxonomy that describes the design space for transactional cache consistency maintenance algorithms and show how proposed algorithms relate to one another. We then investigate the performance of six of these algorithms, and use these results to examine the tradeoffs inherent in the design choices identified in the taxonomy. The insight gained in this manner is then used to reflect upon the characteristics of other algorithms that have been proposed. The results show that the interactions among dimensions of the design space can impact performance in many ways, and that classifications of algorithms as simply "Pessimistic" or "Optimistic" do not accurately characterize the similarities and differences among the many possible cache consistency algorithms. (Also cross-referenced as UMIACS-TR-95-84.)
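
    As a hedged illustration of one corner of that design space, the sketch below shows a detection-based scheme in which a client caches pages across transaction boundaries and validates their versions at commit time; the data structures and names are assumptions for the example, not any of the six algorithms studied in the paper.

```python
# Hedged illustration of a detection-based ("optimistic") scheme: cached page
# versions are validated at commit time. Structures are assumptions for clarity.
server_pages = {"p1": ("data-a", 3), "p2": ("data-b", 7)}   # page -> (contents, version)

class ClientCache:
    def __init__(self):
        self.pages = {}                  # survives across transaction boundaries

    def read(self, pid):
        if pid not in self.pages:
            self.pages[pid] = server_pages[pid]   # fetch (contents, version) once
        return self.pages[pid][0]

    def validate(self, read_set):
        """Commit-time check: every cached page that was read must still be current."""
        return all(self.pages[pid][1] == server_pages[pid][1] for pid in read_set)

cache = ClientCache()
cache.read("p1")
server_pages["p1"] = ("data-a2", 4)      # another client installs a newer version
print(cache.validate({"p1"}))            # False -> this transaction must abort
```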

    New science on the Open Science Grid

    The Open Science Grid (OSG) includes work to enable new science, new scientists, and new modalities in support of computationally based research. There are frequently significant sociological and organizational changes required in transformation from the existing to the new. OSG leverages its deliverables to the large-scale physics experiment member communities to benefit new communities at all scales through activities in education, engagement, and the distributed facility. This paper gives both a brief general description and specific examples of new science enabled on the OSG. More information is available at the OSG web site: www.opensciencegrid.org.

    Flexible Session Management in a Distributed Environment

    Many secure communication libraries used by distributed systems, such as SSL, TLS, and Kerberos, fail to make a clear distinction between the authentication, session, and communication layers. In this paper we introduce CEDAR, the secure communication library used by the Condor High Throughput Computing software, and present the advantages to a distributed computing system resulting from CEDAR's separation of these layers. Regardless of the authentication method used, CEDAR establishes a secure session key, which has the flexibility to be used for multiple capabilities. We demonstrate how a layered approach to security sessions can avoid round-trips and latency inherent in network authentication. The creation of a distinct session management layer allows for optimizations to improve scalability by way of delegating sessions to other components in the system. This session delegation creates a chain of trust that reduces the overhead of establishing secure connections and enables centralized enforcement of system-wide security policies. Additionally, secure channels based upon UDP datagrams are often overlooked by existing libraries; we show how CEDAR's structure accommodates this as well. As an example of the utility of this work, we show how the use of delegated security sessions and other techniques inherent in CEDAR's architecture enables US CMS to meet their scalability requirements in deploying Condor over large-scale, wide-area grid systems.
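
    The following is a hedged sketch of the two ideas highlighted above, reusing one established session key for many messages and delegating a session to another component, with a plain HMAC standing in for CEDAR's actual protocol; all identifiers and fields are illustrative assumptions.

```python
# Hedged sketch (not CEDAR's wire protocol): establish one session key after
# authentication, reuse it for many messages, and delegate the session to
# another component so it can skip re-authentication. Identifiers are assumed.
import hashlib
import hmac
import os
import time

def establish_session(peer_identity):
    """Authenticate once (elided here) and cache a symmetric session key."""
    return {"peer": peer_identity, "key": os.urandom(32), "expires": time.time() + 3600}

def sign(session, payload: bytes) -> bytes:
    """Integrity-protect a message (TCP frame or UDP datagram) with the session key."""
    return hmac.new(session["key"], payload, hashlib.sha256).digest()

def delegate(session, component_id):
    """Hand the session (key plus policy) to another component in the chain of trust."""
    return {**session, "delegated_to": component_id}

submit_session = establish_session("schedd@example.edu")
worker_session = delegate(submit_session, "startd-42")
msg = b"job status update"
# Both ends derive the same MAC without any new authentication round-trips.
assert hmac.compare_digest(sign(worker_session, msg), sign(submit_session, msg))
```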

    Validation of rice genome sequence by optical mapping

    Background: Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap-closing efforts require purely independent means for accurate finishing of sequence build data. Results: To facilitate ongoing sequence finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. Nine of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8, 10, and 12 in their entirety, including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold, providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion: Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion, we envision new applications of such single-molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive but short sequence reads generated by emerging sequencing platforms. Lastly, the map construction techniques presented here point the way to new types of comparative genome analysis that would focus on discernment of structural differences revealed by optical maps constructed from a broad range of rice subspecies and varieties.
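
    The in silico side of that comparison can be illustrated with a short sketch: locate SwaI recognition sites (ATTTAAAT) in an assembled sequence and report the ordered fragment lengths between cuts, which is the form that can be aligned against an optical map. The toy sequence and function below are illustrative, not the IRGSP or TIGR pipelines.

```python
# Toy sketch of an in silico restriction map: ordered SwaI fragment lengths
# computed from a sequence string. The sequence below is made up.
SWAI_SITE = "ATTTAAAT"      # SwaI recognition sequence (blunt cut, ATTT^AAAT)
CUT_OFFSET = 4              # cut position inside the recognition site

def restriction_map(seq, site=SWAI_SITE, offset=CUT_OFFSET):
    seq = seq.upper()
    cuts, start = [], 0
    while True:
        hit = seq.find(site, start)
        if hit == -1:
            break
        cuts.append(hit + offset)
        start = hit + 1
    edges = [0] + cuts + [len(seq)]           # fragment boundaries, ends included
    return [b - a for a, b in zip(edges, edges[1:])]

toy = "GG" + "ATTTAAAT" + "C" * 20 + "ATTTAAAT" + "TTT"
print(restriction_map(toy))                   # [6, 28, 7]
```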

    Practical resource monitoring for robust high throughput computing

    Robust high throughput computing requires effective monitoring and enforcement of a variety of resources including CPU cores, memory, disk, and network traffic. Without effective monitoring and enforcement, it is easy to overload machines, causing failures and slowdowns, or to underutilize machines, which results in wasted opportunities. This paper explores how to describe, measure, and enforce resources used by computational tasks. We focus on tasks running in distributed execution systems, in which a task requests the resources it needs, and the execution system ensures the availability of such resources. This presents two non-trivial problems: how to measure the resources consumed by a task, and how to monitor and report resource exhaustion in a robust and timely manner. For both of these problems, operating systems have a variety of mechanisms with different degrees of availability, accuracy, overhead, and intrusiveness. We describe various forms of monitoring and the available mechanisms in contemporary operating systems. We then present two specific monitoring tools that choose different tradeoffs in overhead and accuracy, and evaluate them on a selection of benchmarks.
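
    As a hedged example of the low-overhead end of that tradeoff, the sketch below runs a task as a child process and reads the operating system's accounting for it afterwards (POSIX getrusage); it is not one of the paper's monitoring tools, and gives only aggregate, after-the-fact numbers rather than timely exhaustion reports.

```python
# Hedged sketch (POSIX only; not one of the paper's tools): run a task as a
# child process and read the kernel's resource accounting for it afterwards.
import resource
import subprocess
import time

def run_and_measure(cmd):
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.time()
    subprocess.run(cmd, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_seconds": time.time() - start,
        "cpu_seconds": (after.ru_utime - before.ru_utime)
                       + (after.ru_stime - before.ru_stime),
        "peak_rss": after.ru_maxrss,   # high-water mark of waited-for children
                                       # (kilobytes on Linux)
    }

print(run_and_measure(["sleep", "1"]))
```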

    The CMS Integration Grid Testbed

    The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2 hardware at the following sites: the California Institute of Technology, Fermi National Accelerator Laboratory, the University of California at San Diego, and the University of Florida at Gainesville. The IGT runs jobs using the Globus Toolkit with a DAGMan and Condor-G front end. The virtual organization (VO) is managed using VO management scripts from the European Data Grid (EDG). Grid-wide monitoring is accomplished using local tools such as Ganglia interfaced into the Globus Metadata Directory Service (MDS) and the agent-based MonALISA. Domain-specific software is packaged and installed using the Distribution After Release (DAR) tool of CMS, while middleware under the auspices of the Virtual Data Toolkit (VDT) is distributed using Pacman. During a continuous two-month span in Fall of 2002, over 1 million official CMS GEANT-based Monte Carlo events were generated and returned to CERN for analysis while being demonstrated at SC2002. In this paper, we describe the process that led to one of the world's first continuously available, functioning grids. (CHEP 2003 MOCT01)